colucix, sorry about the confusion, your result is spot-on - see #13.
I am not interested in the interim files. They were just a kludge for my 4-step solution. The only thing I need is to append the new ids to 2 files (as per my 1st post).
I need to sit down and work through your awk. I'd like to understand how it does it so I know better next time.
Quote:
colucix, sorry about the confusion, your result is spot-on - see #13.
Ok. Just seen the modification.
Quote:
Originally Posted by hashbang#!
I need to sit down and work through your awk. I'd like to understand how it does it so I know better next time.
Just keep in mind that FILENAME is a built-in variable storing the name of the file currently being parsed. When you pass multiple filenames as arguments, awk processes all of them in sequence and the FILENAME variable changes accordingly. For this reason, we have two rules in the code: the first one is executed for all the ids?????? files, the second one only for the idsmore file.
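To make the two-rule idea concrete, here is a minimal sketch. It is not the script posted earlier in the thread: the file names ids000001 and ids000002, the sample ids and the array name seen are made up for illustration, and I am assuming one id per line in the first field.

```shell
#!/bin/sh
# Hypothetical demo data: two "known ids" files and one file of candidates.
dir=$(mktemp -d)
printf '1\n2\n3\n'    > "$dir/ids000001"
printf '4\n5\n'       > "$dir/ids000002"
printf '2\n5\n6\n7\n' > "$dir/idsmore"

new_ids=$(
  cd "$dir" &&
  awk '
    # Rule 1: runs for every line of the ids?????? files; remember the id.
    FILENAME != "idsmore" { seen[$1] = 1; next }
    # Rule 2: only reached for lines of idsmore; print ids not seen before.
    !($1 in seen) { print $1 }
  ' ids?????? idsmore
)

echo "$new_ids"
rm -rf "$dir"
```

The order of the arguments matters here: the ids?????? files have to come first so that the array is complete by the time awk starts reading idsmore.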
Can you post the code to generate those, because I can't imagine in my head what needs to be done.
You can simply add more print statements as you like. Since the code does not require sorting (unlike the first lines of the OP's script), there is no need for the sorted temporary files... or were you really interested in how to perform sorting in awk?
I was asking about the test files. I cannot write a solution without test files. You said you somehow made 5 files containing 56000 different ids each (total 280000) and a file "idsmore" containing 300000 different ids. How? I don't really need that many. Or, if the problem is solved, just forget about it.
Oops, sorry... I totally misunderstood your post. I generated 280000 numbers between 1 and 1000000 using the following awk code:
Code:
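BEGIN {
    srand()    # seed the generator from the current time
    do {
        # random integer between 1 and 1000000
        num = 1 + int(rand() * 1000000)
        if (! (num in _)) {    # print each number only once
            print num
            _[num] = ""
            count++
        }
    } while (count < 280000)
}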
Then I cut the output into five pieces using the split command, and finally I added 20000 more numbers (plus dates) to generate idsmore. For testing purposes, you could also generate sequential numbers between 1 and 280000 using seq.
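A rough sketch of the seq variant, in case it helps. The ids prefix with a six-letter suffix is my assumption, chosen only so that the pieces match the ids?????? pattern. The date column of idsmore is left out because its format is not shown in this thread.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing gets overwritten.
dir=$(mktemp -d)

# Sequential ids instead of random ones, cut into five 56000-line pieces.
# -a 6 gives six-letter suffixes: idsaaaaaa, idsaaaaab, ...
seq 1 280000 | split -l 56000 -a 6 - "$dir/ids"

# idsmore: the same 280000 ids plus 20000 new ones (no dates in this sketch).
seq 1 300000 > "$dir/idsmore"

pieces=$(ls "$dir"/ids?????? | wc -l | tr -d ' ')
piece_lines=$(wc -l < "$dir/idsaaaaaa" | tr -d ' ')
more_lines=$(wc -l < "$dir/idsmore" | tr -d ' ')
echo "$pieces pieces, $piece_lines lines each, $more_lines lines in idsmore"

rm -rf "$dir"
```

Sequential ids are fine for checking correctness. For timing comparisons the random set is probably closer to the real data.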
Am I understanding this correctly, that grep -F speeds up the operation because it does not try to interpret the patterns as regular expressions?
Yes, when grep knows that the patterns aren't regular expressions it can use the hash table approach, suggested by some posters here, internally.
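Speed aside, -F also changes what gets matched. Here is a small made-up example (the patterns and data files are hypothetical) where the pattern contains a dot: with -F only the literal string matches, while without it the dot matches any character.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '1.5\n'               > "$dir/patterns"
printf '1.5\n125\n1x5\n200\n' > "$dir/data"

# Fixed strings: the dot is just a dot.
fixed=$(grep -F -f "$dir/patterns" "$dir/data")

# Regular expressions: the dot matches any single character.
regex=$(grep -f "$dir/patterns" "$dir/data")

echo "with -F:    $(echo "$fixed" | tr '\n' ' ')"
echo "without -F: $(echo "$regex" | tr '\n' ' ')"
rm -rf "$dir"
```

With purely numeric ids both forms select the same lines, which fits the observation that the result is identical with and without -F.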
Quote:
I get the same result without -F but it crawls. fgrep is a fraction of a second faster than grep -F.
You have to expect some random variation between different runs of the same program.
Quote:
egrep is the same as grep -E. fgrep is the same as grep -F. Direct invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified.